Frequency warping and robust speaker verification: a comparison of alternative mel-scale representations
Authors
Abstract
Accuracy of speaker verification is high under controlled conditions but falls off rapidly in the presence of interfering sounds. This is because spectral features, such as Mel-frequency cepstral coefficients (MFCCs), are sensitive to additive noise. MFCCs are one particular realization of a warped-frequency representation with a low-frequency focus, but there are several alternative, potentially more robust, warped-frequency representations. We provide an experimental comparison of five warped-frequency features. They use exactly the same frequency warping function, the same number of coefficients and the same postprocessing, but differ in their internal computations. The compared variants are (1) conventional MFCCs from the discrete Fourier transform (DFT), followed by a Mel-scaled filterbank, (2) MFCCs via direct warping of the DFT, followed by a linear-scale filterbank, (3) warped linear prediction features, (4) perceptual minimum variance distortionless features and (5) recently proposed sparse Mel-scale histogram features. Experiments carried out on a subset of the SRE 10 corpus using a scaled-down i-vector system indicate that direct DFT warping outperforms conventional MFCCs in most cases.
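To make the difference between variants (1) and (2) concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the warping formula, the filter shapes or how the DFT is warped, so the common 2595·log10(1 + f/700) Mel formula, triangular filters, and warping by linear interpolation of the power spectrum onto a Mel-uniform grid are all assumptions, as are the function names and default parameters.

```python
import numpy as np

def hz_to_mel(f):
    # Assumed Mel formula; the abstract does not say which one is used.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def triangular_filterbank(points, axis):
    # points holds n_filters + 2 edge/center positions on the same scale as axis.
    fb = np.zeros((len(points) - 2, len(axis)))
    for i in range(1, len(points) - 1):
        lo, mid, hi = points[i - 1], points[i], points[i + 1]
        rising = (axis - lo) / (mid - lo)
        falling = (hi - axis) / (hi - mid)
        fb[i - 1] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct2(x, n_coeffs):
    # Unnormalized DCT-II, keeping the first n_coeffs coefficients.
    n = len(x)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ x

def power_spectrum(frame, n_fft):
    windowed = frame * np.hamming(len(frame))
    return np.abs(np.fft.rfft(windowed, n_fft)) ** 2

def mfcc_conventional(frame, sr, n_fft=512, n_filters=20, n_coeffs=12):
    # Variant (1): linear-frequency DFT, filters spaced uniformly in Mel.
    spec = power_spectrum(frame, n_fft)
    freqs = np.linspace(0.0, sr / 2.0, len(spec))
    points_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    fb = triangular_filterbank(points_hz, freqs)
    return dct2(np.log(fb @ spec + 1e-10), n_coeffs)

def mfcc_direct_warp(frame, sr, n_fft=512, n_filters=20, n_coeffs=12):
    # Variant (2): resample the spectrum on a Mel-uniform grid (assumed to be
    # done by interpolation here), then apply uniformly spaced filters.
    spec = power_spectrum(frame, n_fft)
    freqs = np.linspace(0.0, sr / 2.0, len(spec))
    mel_grid = np.linspace(0.0, hz_to_mel(sr / 2.0), len(spec))
    warped = np.interp(mel_to_hz(mel_grid), freqs, spec)
    points = np.linspace(0.0, mel_grid[-1], n_filters + 2)
    fb = triangular_filterbank(points, mel_grid)
    return dct2(np.log(fb @ warped + 1e-10), n_coeffs)
```

Both functions share the warping function, filter count and number of coefficients, mirroring the controlled setup the abstract describes; they differ only in whether the warping is applied to the filter positions or to the spectrum itself. Variants (3) to (5) are not sketched because the abstract gives too little detail about them.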
Related resources
Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach
We propose a system to real environmental noise and channel mismatch for forensic speaker verification systems. This method is based on suppressing various types of real environmental noise by using independent component analysis (ICA) algorithm. The enhanced speech signal is applied to mel frequency cepstral coefficients (MFCC) or MFCC feature warping to extract the essential characteristics o...
Wavelet-Based Mel-Frequency Cepstral Coefficients for Speaker Identification using Hidden Markov Models
To improve the performance of speaker identification systems, an effective and robust method is proposed to extract speech features, capable of operating in noisy environment. Based on the time-frequency multi-resolution property of wavelet transform, the input speech signal is decomposed into various frequency channels. For capturing the characteristic of the signal, the Mel-Frequency Cepstral...
Using Exciting and Spectral Envelope Information and Matrix Quantization for Improvement of the Speaker Verification Systems
Speaker verification from a few spoken words or sentences has many applications. Many methods, such as DTW, HMM, VQ and MQ, can be used for speaker verification. We applied MQ for its precise, reliable and robust performance combined with computational simplicity. We also used pitch frequency and log gain contour to further improve system performance.
Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition
This paper presents an experimental study of the application of the scale transform to improve the performance of speaker-independent continuous speech recognition. Three major results are described. First, a comparison was made between the scale-transform based magnitude cepstrum coefficients (STCC) and mel-scale filter bank cepstrum coefficients (MFCC) on a telephone based connected digit recog...